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Abstract — We discuss a method for sparse signal 
approximation, which is based on the correlation of 
the target signal with a pseudo-random signal, and 
uses a modification of the greedy matching pursuit al- 
gorithm. We show that this approach provides an ef- 
ficient encoding-decoding method, which can be used 
also for lossy compression and encryption purposes. 

Keywords: sparse approximation; lossy compression; 
matching pursuit. 

1 Introduction 

Recently there has been an increased interest in aher- 
natives to traditional signal approximation and compres- 
sion techniques. Several new methods for approximat- 
ing signals in dictionaries of waveforms (Fourier, Gabor, 
wavelets, cosine packets etc.) have been proposed [I]- [5]. 
Such a dictionary is a collection of waveforms represented 
by discrete time signals, also called atoms. Using this ap- 
proach, a discrete-time signal of length N is decomposed 
in a sparse linear combination of dictionary atoms with 
corresponding coefficients. Dictionaries can be complete, 
if they contain TV atoms, and respectively overcomplete, 
if they contain more than TV atoms. Most of the dictio- 
naries are obtained by merging complete dictionaries con- 
sisting of different types of waveforms. For typical appli- 
cations the size of these dictionaries is quite big (tens-of- 
thousands of atoms), and raises high computational dif- 
ficulties and memory requirements. The decomposition 
of a signal using overcomplete dictionaries is nonunique, 
since some elements in the dictionary have representa- 
tions in terms of other elements. Therefore, the sparse 
approximation problem in an overcomplete dictionary is 
to find the minimal representation of a signal, in terms of 
dictionary atoms. Finding the sparsest approximation of 
a signal from an arbitrary dictionary is an NP-hard prob- 
lem. Despite of this, several sub-optimal methods have 
been recently developed, such that a wide range of appli- 
cations (coding, source separation, denoising etc.) have 
benefited from the progress made in this area. 

Inspired by the sparse coding paradigm, here we discuss 
a method for signal approximation, which is based on the 
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correlation of the target signal with a pseudo-random sig- 
nal, and uses a modification of the greedy matching pur- 
suit algorithm. We show that this approach provides an 
efficient encoding-decoding method, with good computa- 
tional speed and low memory requirements, which also 
can be used for lossy compression and encryption pur- 
poses. 

2 Sparse approximation problem 

The classical signal approximation problem seeks to rep- 
resent an arbitrary signal by the best approximation us- 
ing a restricted class of signals. The general formulation 
is as follows '6|. Consider H is an A^— dimensional Hilbert 
space (or more generally a Banach space) , with the norm 
defined in terms of the inner product as: 
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and an M-dimensional subspace U C H, M < N, 
spanned by the orthonormal basis: 



,(0) 



(M-l) 



}eu 



1 if i = j 
if i^J ' 

M-l 



VyeU, 3c„, m = 0,...,M-l, y ^ J2 



(2) 



(3) 



(4) 



m=0 



Given a vector x G H, the problem is to find the vector 
y & U such that: 
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According to the best approximation theorem, this prob- 
lem has a unique solution: 
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where 



m = 0,l,...,Af- 1, 
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are the Fourier coefficients. 



The sparse approximation problem is different than the 
classical one, since we do not seek a representation in an 
orthonormal basis, but on a dictionary $ e such that 
span{^) = H [T]-[S]. In general, we consider a countable 
and overcomplete dictionary of functions: 



{(^(")|m = 0, 1,...,M- 
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in which are normalized (H^'*''"'' II2 — l)i but not or- 
thogonal, and possibly redundant. Given x G H, the 
sparse approximation problem consists in finding the 
sparse coefficients vector c G M*^, such that: 



^ = ^'^S mm llcllo, 
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where e > is a small positive constant, and: 



M-l 



(10) 



m=0 



is the £0 norm, measuring the number of nonzero coef- 
ficients. Obviously, when using an overcomplete dictio- 
nary we have more vectors, and thus a better probability 
of finding a small number of vectors that approximate 
well the given vector. However, since the dictionary may 
contain linearly dependent vectors, such an expansion is 
no longer unique. Also, in a lossy compression framework 
the goal is to use as few vectors as possible, in order to 
obtain a good approximation. Unfortunately, this is an 
NP-hard combinatorial optimization problem, since in or- 
der to find the optimal expansion it is necessary to try 
all possible combinations: 
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searching for the smallest collection of K non-zero terms 
which best approximates the signal. 

Several methods have been developed to solve the sparse 
approximation problem. The standard approach is based 
on the convcxification of the objective function, obtained 
by replacing the £0 norm with the £1 norm [l]-[4]: 
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The resulting optimization problem: 
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is known as Basis Pursuit (BP), and it can be solved us- 
ing linear programming techniques whose computational 
complexities are polynomial [1! . However, in most real 
applications the BP approach requires the solution of a 
very large convex, non-quadratic optimization problem. 



and therefore suffers from high computational complex- 
ity. Another approach is based on greedy algorithms, 
which are suboptinial and require far less computations. 
Our goal is not only to obtain a good sparse expansion, 
but also to provide a fast computational method, there- 
fore here we focus our attention on the greedy Match- 
ing Pursuit (MP) algorithm [5], which is the fastest 
known algorithm for the sparse approximation problem. 
Also, since we are interested in developing a compres- 
sion scheme, where only maximum K < N out of M 
dictionary elements can be used in the expansion, we re- 
formulate the problem as following: 



c = arg mm 
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3 Random dictionary 

Previous studies have shown that the sparsity of the ap- 
proximation depends on using an appropriate dictionary 
for the given class of signals [I]- [5]. For example, multi- 
scale decompositions of natural images into discrete co- 
sine or wavelet bases are quasi-sparse. Such decomposi- 
tions have a few significant coefficients, which concentrate 
most of the energy and information. This energy com- 
paction property is then exploited in compression and de- 
noising applications, where the weak coefficients are usu- 
ally discarded. Thus, a method to construct overcomplete 
dictionaries consists by concatenating orthonormal bases 
like: Fourier, Gabor functions, wavelets, cosine packets 
etc. These dictionaries can be improved by employing 
learning methods |7j, which adapt an initial dictionary 
to a set of training samples. In this case the goal is to 
optimize a dictionary, such that a given class of signals, 
has a sparse approximation. 

The sparse decomposition abilities of such dictionaries are 
characterized by the restricted isometry property (RIP) 
of the N X M matrix <t, with the columns given by the 
dictionary atoms ip^"^\ The if -restricted isometry con- 
stant 5k of $ is the smallest quantity such that for every 
if-sparse vector c G M^^ we have [2]- [4]: 
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This means that every set of less than K columns are 
approximately orthogonal. Smaller 6k means better or- 
thogonality, and therefore a better discrimination capa- 
bility of atoms. Recently it has been shown that ran- 
dom matrices satisfy the RIP with high probability 1^- 
[3]. Therefore, some good examples of overcomplete dic- 
tionaries include: 

- matrices of independent and identical distributed (i.i.d) 
Gaussian samples from A^(0, 1); 



- matrices with Bernoulli entries, where ipnm = il with 
equal probability p = 1/2; 

- matrices with randomly sampled Fourier elements etc. 

Inspired by the above results obtained for random ma- 
trices, here we propose a simplified approach. Instead of 
using a large random dictionary, we simply use a random 
vector of length N + M: 

f = [/o, /i, fM-i+Nf e K™, (16) 

generated by a reproducible random process, a pseudo- 
random number generator for example. An atom of this 
dictionary will simply be a normalized "window" vector: 

^(m) _ [fm, /rn+l; fm+N-l]'^ ^ j-^y-j 
/ Ar_i 2 
Y ^n=0 J m+n 

Obviously, there are M such window vectors in any real- 
ization of the pseudo-random process, and they are un- 
correlated since /,„ are i.i.d random variables (depending 
on the quality of the random number generator used). 
Such a vector can be easily generated by encoder and de- 
coder using only a given seed value. This way we avoid 
the high memory and computational requirements, while 
still obtaining a good sparse decomposition, which we 
will show that it can be also efficiently used for lossy 
compression. Without loosing generalization, in order to 
obtain an easy normalization we consider that each fm is 
a Bernoulli random variable. Thus, the normalization is 
easily achieved by simply dividing the vector / with \/N. 
We should mention that the mean (x) of the signal can 
be captured correctly if we assume that the components 
of first atom of the dictionary are all set to 1: 

^(") = -^[l,l,...,lF. (18) 

This dictionary choice also provides a simple encryption 
scheme, since for different seeds one obtains different dic- 
tionaries. Therefore, if the seed is user defined then the 
obtained expansion will also be encrypted, since the same 
secret seed is needed for decoding. 

4 Matching pursuit 

Matching Pursuit is a well known greedy algorithm 
widely used in approximation theory and statistics [5^. 
One of its main features is that it can be applied to arbi- 
trary dictionaries. Starting from an initial approximation 
c — and residual r — x, the algorithm uses an iterative 
greedy strategy to pick the dictionary atoms that are the 
most strongly correlated with the residual. Then, succes- 
sively their contribution is subtracted from the residual, 
which this way can be made arbitrarily small. Using the 
simplified dictionary /, the pseudo-code of the MP algo- 
rithm takes the form listed in Algorithm 1. 



Algorithm 1. Matching Pursuit (MP) 



K] / / number of atoms in the approximation 
c ^ 0; // coefficients of selected atoms 
p 4- 0; // positions of selected atoms 
r // initial residual 
foT{k^O,l,...,K 

for(m = 0,l,...,Af-l){ 

s ^ (r,v3(")); 

if(|s| > |s„iaa;|){ 

i -S— m;}} 
Pk <~ i; 

r -(^ r — Cfc(y9(P'=^;} 
return p, c; 



Thus, at each iteration step A: = 0, 1, K — I the algo- 
rithm selects the index pk of the atom tp^P'' '> , which has the 
highest correlation with the current residual, and updates 
the estimate of the corresponding coefficient Cfc, and the 
residual r. After K selection steps the algorithm returns 
the positions p of the selected atoms in the dictionary 
and their corresponding coefficients c in the expansion. 
A shortcoming of the MP algorithm is that although the 
asymptotic convergence is guaranteed and it can be eas- 
ily proved, the resulting approximation after any finite 
number of steps K < N will in general be suboptimal. 
Thus, one cannot expect an exact reconstruction of the 
target signal after decoding. 

One can see that the decoding step requires both the 
positions p and the coefficients c of the selected atoms in 
order to compute the K-term approximation: 

K-l 
fc=0 

This is inconvenient from the point of view of compres- 
sion. Assuming for example that the elements of the 
input vector x S are floating point numbers repre- 
sented on Q bits, we need NQ bits to store the whole 



vector. Thus, in order to compress x we need to reduce 
this number. For each position we need Q bits, and for 
each coefficient we also need Q bits, therefore the output 
of the MP algorithm requires 2KQ bits. Thus, in order 
to achieve compression we must have K < N/2. This 
condition can be relaxed by imposing that both pk and 
Cfc arc stored together as a single number represented on 
Q bits. Of course, one may think that in this case the 
precision of Ck will be affected, and the approximation 
will deteriorate significantly. However, due to the ran- 
dom characteristic of the dictionary we may expect that 
this may actually work, and at each step the algorithm 
will pick the atom with the best pair (pk, Ck) which can be 
"accommodated" on a number hk represented on Q bits. 
Thus, instead of using 2Q bits to store a pair {pk, Cfe), we 
may actually use only Q bits to store their equivalent hk- 
The question is how to do this efficiently? 

We observe that we need to find the atom characterized 
by a pair (p^, Cfe), where there is a trade-off between the 
necessary precision of Ck and the length of pk, such that 
they can be represented together on Q bits, and their 
inclusion in the approximation expansion decreases the 
residual r. Ideally, we would like to allocate Q/2 bits 
for the position pk, and Q/2 bits for the corresponding 
coefficient Cfc. That means to restrict the length of the 
dictionary to M = 2^^/^. Thus, for a typical integer rep- 
resentation on Q = 32 bits, the dictionary will contain 
M = 2^^ elements. 

Now, let us assume the signal is normalized a; ^ x/ ||a;|| 
before the compression. Since the MP algorithm will al- 
ways produce a residual r with ||r|| < = 1, the re- 
sult of the correlation term s ^ (r, (p'^"^^ ) , will always 
be bounded: s G (—1,1). Thus, the value of a resulted 
coefficient will also be bounded by the same interval: 
Ck S (—1,1). Finally, the idea is to make the position 
Pk equal with the integer part of hk, and the coefficient 
Cfc equal with the fractional part of hk- 

int{hk) ^ Pk, frac{hk) Cfc. (20) 

Obviously, by doing this some of the precision in the 
representation of Cfc will be lost. The compressive MP 
(CMP) algorithm which takes into account these modifi- 
cations is listed in Algorithm 2. 

One can see that the computation of the cross-correlation 
term s requires two extra steps. In the first step, the in- 
dex m of the currently tested atom (^("^ , and its correla- 
tion s e (—1, 1) with the residual, are packet together in 
s, using: 

s ^ sign{s)m + s. (21) 

This will result in a loss of the precision of the fractional 
part of s, since it needs to accommodate also the integer 
part m, on the same number Q of bits. In the second 
step we extract the resulted fractional part using: 

s ^ sign{s){\s\ - \int{s)\), (22) 



in order to perform the comparison with the current max- 
imum value l-s^Tio^ |. If the test is true, then we store the 
obtained values in Smax ^ s, and respectively i ^ m. 
The values Smax and i, corresponding to the the best 
atom are then used to update the residual, and they 
are packed into hk, this time without information loss. 
Thus, after the first packing step, when some precision 
is lost, the future packing- unpacking steps become re- 
versible, and the precision is conserved. 



Algorithm 2. Compressive Matching Pursuit (CMP) 



K; / / number of atoms in the approximation 

ft <— 0; // positions and coefficients of selected atoms 

hx II a; II 2; // signal normalization 

r ^ x/hK', / / initial residual 

for(/e = 0, l,...,i4:- 1){ 

for(TO = 0, 1,...,M-1){ 
s<(- (r,(^('")); 
s sign{s)m + s; 
s sign{s){\s\ — \mt{s)\); 

if(|s| > |s„a2:|){ 
^rnax ^ ^, 

i to;}} 

^ ^ ^ ^max'^^ ^ ', 

hk ^ signi^Smax^'^ ~t~ ^max?} 

return h, s; 



We should also save the norm of x, which is required for 
the decoding step: hx = \\x\\2- Therefore, the length of 
the vector his K+1, where the first K values correspond 
to the positions (the positive integer part) and coefficients 
(the fractional part) of the atoms selected in the approxi- 
mation expansion, while the last value contains the norm 
of the input signal. The decoding procedure is very fast 
and it is done as following: 

K-l 

y^hxYl sign{hk){\hk\ - |mt(/ifc)|)^(l^"*('''=)l) ~ x- 

fe=0 

(23) 
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Figure 1: Relative approximation error e of the MP and 
CMP algorithms. 

5 Numerical results 

We have implemented the MP and CMP algorithms in 
parallel using C, OpenMP and GCC on a Linux platform. 
The parallel OpenMP version is almost n times faster 
than the serial version, where n is the number of available 
CPU cores. 

In the following experiments we consider a random 
Bernoulli dictionary with M = 2^^ elements. Also, in 
order to speed up computation we set the length N of 
the input signal to = 128. A longer input signal can 
be easily divided in chunks of length A^ < 128, which 
can be then independently processed. Ideally, we would 
like to approximate and compress any kind of signal, so 
the shape of the signal doesn't really matter. Therefore, 
for testing we choose random samples drawn from a uni- 
form distribution on the interval (-2^^, 2^^), represented as 
floating point numbers on Q = 32 bits. This, means that 
we practically approximate and compress noise. This 
is usually a difficult task, since such signals will have 
the widest possible bandwidth, for the considered finite 
length A^ of the samples. The quantity of interest is the 
relative recovery error, which is defined as following: 
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where [x] is the signal mean value, and y is the K- 
term approximation expansion. In Figure 1 we give the 
value of e as a function of the inverse of compression 
ratio p""'^ = K/N (linear scale - top, logarithmic scale 
- bottom). The results were averaged over 1000 sam- 
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Figure 2: Decomposition of a random signal using AIP 
and CMP algorithms. 



pies for K = 1,...,A''. One can see that the effect of 
the CMP packing procedure (p, c) ^ h manifests only 
for K/N > 0.5, which is obviously not too bad for lossy 
compression purposes. Also, we have £(0.5) ~ 1%, which 
means that a compression ratio of pc mp —2:1 produces 
a distortion of the data of only 1%. 

In Figure 2 we give a typical sample, and its decom- 
position by the MP and CMP algorithms. Here we 
have represented the values of the coefficients Cfc and the 
positions pk of the corresponding atoms in the dictio- 
nary. In the case of CMP, the positions and the coeffi- 
cients are extracted from hk, using: pk = int{hk), and 
Cfc — sign{hk){\hk\ — \pk\)- The positions and the coef- 
ficients for MP and CMP are different, due to the ex- 
tra packing constraint in the CMP algorithm. Also, we 
should notice the exponential decrease of the magnitude 
of the coefficients Ck- Thus, a lossy compression of the 
signal can be achieved retaining only the first coefficients 
with large values. 

In Figure 3 we have the same signal from Figure 2, and 
its recovery after lossy compression, for several differ- 
ent compression ratios: pcMP = 16 : 1, e = 51.30%; 
PCMP = 8 : 1, £ = 28.25%; pcMP = 4 : 1, £ = 9.14%; 
and respectively pcMP — 2 : 1, e — 1.01%. Similar re- 
sults have been obtained for different types of signals. 
For example, in Figure 4 we consider a smooth signal (a 
superposition of sinusoids) for several compression ratios: 
PCMP = 8 : 1, £ = 27.83%; pcMP = 4 : 1, £ = 8.92%; 
and respectively pcMP = 2 : 1, £ = 0.98%. 
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Figure 3: Lossy compression of a random signal. 
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Conclusion 

We have discussed a sparse random approximation 
method, based on the correlation of the target signal with 
a pseudo-random signal, and a modification of the greedy 
matching pursuit algorithm. We have shown that this ap- 
proach provides an efficient encoding-decoding method. 
Also, the presented method has the advantage of an easy 
implementation, with high computational speed and low 
memory requirements, which also can be used for lossy 
compression and encryption purposes. 
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Figure 4: Lossy compression of a smooth signal. 



